Specific Question: How does station infrastructure relate to e-bike vs classic bike usage?
Key Finding: High-infrastructure areas show lower e-bike usage, the opposite of what you might expect. This suggests that e-bike usage there is limited by how many e-bikes remain available under heavy demand, not by infrastructure quality.
# Save station classification
saveRDS(stations, "data/processed/stations_infrastructure.rds")
Visualization 1: Infrastructure Map
# Base map: Manhattan outline and bike routes, with stations drawn on top.
# linewidth replaces size for line widths in ggplot2 >= 3.4.0.
ggplot() +
  geom_sf(data = manhattan_poly, fill = "gray95", color = "gray60", linewidth = 0.3) +
  geom_sf(data = manhattan_routes_sf, color = "lightblue", alpha = 0.3, linewidth = 0.5) +
  geom_point(
    data = stations,
    aes(x = lng, y = lat, color = infrastructure_level, size = nearby_count),
    alpha = 0.7
  ) +
  scale_color_manual(
    values = c("Low" = "#F44336", "Medium" = "#FF9800", "High" = "#4CAF50"),
    name = "Infrastructure Level"
  ) +
  scale_size_continuous(name = "Station Density", range = c(1, 4)) +
  labs(
    title = "Citi Bike Station Infrastructure",
    subtitle = "Stations classified by bike lane proximity and station density",
    caption = "Size indicates number of nearby stations within 500m"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    axis.text = element_blank(),
    axis.title = element_blank(),
    legend.position = "right",
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, color = "gray40")
  )
Figure 1: Station Infrastructure Classification
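The map sizes each station by the number of other stations within 500 m, but the code that builds nearby_count is not shown in this section. The following is a minimal sketch of one way such a count could be computed, assuming straight-line (haversine) distance. The helper names haversine_m and count_nearby and the toy coordinates are illustrative assumptions, not the author's code or data.

```r
# Haversine distance in metres from one point to a vector of points.
# Assumes a spherical Earth with radius 6,371 km.
haversine_m <- function(lat1, lng1, lat2, lng2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Toy stations with made-up coordinates (NOT Citi Bike data).
toy_stations <- data.frame(
  station_name = c("A", "B", "C", "D"),
  lat = c(40.7500, 40.7510, 40.7520, 40.8000),
  lng = c(-73.9900, -73.9900, -73.9900, -73.9500)
)

# For each station, count the other stations within radius_m metres.
count_nearby <- function(df, radius_m = 500) {
  vapply(seq_len(nrow(df)), function(i) {
    d <- haversine_m(df$lat[i], df$lng[i], df$lat, df$lng)
    sum(d <= radius_m) - 1L  # subtract 1 to exclude the station itself
  }, integer(1))
}

toy_stations$nearby_count <- count_nearby(toy_stations)
print(toy_stations)
```

This pairwise loop is quadratic in the number of stations, which is acceptable for a few hundred stations; a spatial index or sf::st_is_within_distance would likely be preferable for larger sets.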
E-bike Usage Analysis
# Calculate e-bike % by station
station_variation <- citibike_sample[, .(
  ebike_pct = mean(rideable_type == "electric_bike") * 100,
  trips = .N,
  lat = mean(start_lat),
  lng = mean(start_lng)
), by = start_station_name]

# Merge with infrastructure
station_full <- merge(
  station_variation,
  stations[, .(station_name, infrastructure_level, infrastructure_score)],
  by.x = "start_station_name", by.y = "station_name",
  all.x = TRUE
)

# Calculate bike balance (relative to average)
overall_ebike_share <- mean(citibike_sample$rideable_type == "electric_bike") * 100
station_full[, bike_balance := ebike_pct - overall_ebike_share]

cat("Overall e-bike usage:", round(overall_ebike_share, 1), "%\n")
Overall e-bike usage: 68 %
Visualization 2: Infrastructure vs E-bike Usage
# Keep stations with a known infrastructure score, at least 50 trips,
# and at least 40% e-bike share (stations below 40% are excluded from the trend).
ggplot(
  station_full[!is.na(infrastructure_score) & trips >= 50 & ebike_pct >= 40],
  aes(x = infrastructure_score, y = ebike_pct)
) +
  geom_point(aes(color = infrastructure_level, size = trips), alpha = 0.6) +
  # Unweighted linear fit with a confidence band
  geom_smooth(method = "lm", color = "black", linetype = "dashed", se = TRUE) +
  scale_color_manual(
    values = c("Low" = "#F44336", "Medium" = "#FF9800", "High" = "#4CAF50"),
    name = "Infrastructure Level"
  ) +
  scale_size_continuous(range = c(1, 6), name = "Total Trips") +
  labs(
    title = "Infrastructure Score vs. E-bike Usage",
    subtitle = "Each dot is a station | Dashed line shows overall trend",
    x = "Infrastructure Score (higher = better infrastructure)",
    y = "E-bike Usage (%)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
Figure 2: Infrastructure Score vs E-bike Usage
Key Finding: The trend line slopes downward—as infrastructure score increases, e-bike usage decreases.
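The downward slope can also be checked numerically instead of only by eye. The sketch below fits the same kind of unweighted linear model that geom_smooth(method = "lm") draws, then reports the slope and its confidence interval. The toy_stations_fit data frame is made up for illustration; with the real data, the filtered station_full table would take its place.

```r
# Toy station-level data (illustrative values, NOT the report's results).
toy_stations_fit <- data.frame(
  infrastructure_score = c(1, 2, 3, 4, 5, 6),
  ebike_pct = c(74, 73, 71, 70, 68, 66)
)

# Unweighted least-squares fit: e-bike share as a function of score.
fit <- lm(ebike_pct ~ infrastructure_score, data = toy_stations_fit)

# Slope = change in e-bike percentage points per one-unit score increase.
slope <- coef(fit)[["infrastructure_score"]]
slope_ci <- confint(fit)["infrastructure_score", ]

cat("Slope:", round(slope, 2), "percentage points per score unit\n")
cat("95% CI:", round(slope_ci[1], 2), "to", round(slope_ci[2], 2), "\n")
```

A negative slope whose confidence interval excludes zero would support the reading of Figure 2; an interval that spans zero would mean the visual trend is not statistically distinguishable from flat.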
Visualization 3: Bike Type Balance Map
ggplot() +
  geom_sf(data = manhattan_poly, fill = "gray95", color = "gray60", linewidth = 0.3) +
  geom_point(
    data = station_full[trips >= 50],
    aes(x = lng, y = lat, color = bike_balance, size = trips),
    alpha = 0.9
  ) +
  # Diverging palette centered on 0 (the Manhattan-wide average)
  scale_color_gradientn(
    colors = c("#08306b", "#2171b5", "#6baed6", "#f7f7f7", "#fcbba1", "#fb6a4a", "#cb181d"),
    values = scales::rescale(c(-60, -30, -10, 0, 10, 25, 40)),
    limits = c(-60, 40),
    breaks = c(-30, 0, 20),
    labels = c("More classic", "Near avg", "More e-bikes"),
    name = "Relative E-bike Share"
  ) +
  scale_size_continuous(range = c(2, 7), name = "Station\nTrip Volume", labels = scales::comma) +
  labs(
    title = "Where Are Stations More or Less E-bike-Heavy?",
    subtitle = sprintf(
      "Color shows difference from Manhattan average (~%.0f%% e-bikes): orange = higher, blue = lower",
      overall_ebike_share
    ),
    caption = "Data: Citi Bike Manhattan trips | Stations with 50+ trips shown"
  ) +
  coord_sf(datum = NA) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    axis.text = element_blank(),
    axis.title = element_blank(),
    legend.position = "right",
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, color = "gray40")
  )
Figure 3: Bike Type Balance Across Manhattan
Geographic Pattern:
Lower Manhattan (high infrastructure) = Blue = More classic bike usage
Upper Manhattan (low infrastructure) = Orange = More e-bike usage
# Heatmap of e-bike share by time of day and station activity, split by rider type.
# linewidth replaces size for tile borders in ggplot2 >= 3.4.0.
ggplot(ebike_full, aes(x = time_period, y = volume_group, fill = ebike_pct)) +
  geom_tile(color = "white", linewidth = 1.5) +
  geom_text(
    aes(label = paste0(round(ebike_pct, 1), "%")),
    size = 4.5, fontface = "bold", color = "white"
  ) +
  facet_wrap(~rider_label) +
  # Diverging fill centered on the overall average of 68%
  scale_fill_gradient2(
    low = "#3498db",
    mid = "#95a5a6",
    high = "#e74c3c",
    midpoint = 68,
    name = "E-bike\nUsage",
    limits = c(60, 80),
    breaks = seq(60, 80, 5),
    labels = function(x) paste0(x, "%")
  ) +
  labs(
    title = "E-bike Usage Patterns Across Multiple Dimensions",
    subtitle = "Blue = More classic bikes | Red = More e-bikes | Overall average = 68% e-bikes",
    x = "Time of Day",
    y = "Station Activity Level",
    caption = "Rider Type Distribution: Casual – 22% | Member – 78%"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 11),
    axis.text.y = element_text(size = 11),
    strip.text = element_text(size = 14, face = "bold"),
    plot.title = element_text(size = 16, face = "bold", hjust = 0),
    plot.subtitle = element_text(size = 11, color = "gray30"),
    plot.caption = element_text(hjust = 0, size = 9),
    panel.grid = element_blank(),
    legend.position = "right"
  )
Figure 4: E-bike Usage Patterns Across Multiple Dimensions
Key Patterns:
Highest e-bike usage (78.8%): Casual riders at low-traffic stations during evening rush
Lowest e-bike usage (63.5%): Members at high-traffic stations during morning rush
Gap: about 15 percentage points (78.8% vs. 63.5%) between the highest and lowest cells
E-bike usage is higher at low-traffic stations and during the evening rush. This may reflect riders preferring an easier ride home after a long day, and e-bikes still being available to take at quieter stations.
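The heatmap is drawn from ebike_full, whose construction is not shown in this section. The sketch below shows one way a table of that shape could be produced: e-bike share per rider type, time period, and station activity level, plus the gap between the highest and lowest cells. It uses base R and made-up trips; the column names follow the plot code, but how time_period and volume_group were actually defined is an assumption here.

```r
# Toy trips (illustrative only, NOT Citi Bike data).
toy_trips <- data.frame(
  rideable_type = c("electric_bike", "electric_bike", "electric_bike", "classic_bike",
                    "electric_bike", "classic_bike", "classic_bike", "electric_bike"),
  rider_label   = rep(c("Casual", "Member"), each = 4),
  time_period   = rep(c("Evening rush", "Morning rush"), each = 4),
  volume_group  = rep(c("Low traffic", "High traffic"), each = 4)
)

# Flag e-bike trips, then average the flag within each group to get a share.
toy_trips$is_ebike <- toy_trips$rideable_type == "electric_bike"
ebike_summary <- aggregate(
  is_ebike ~ rider_label + time_period + volume_group,
  data = toy_trips,
  FUN = function(x) mean(x) * 100
)
names(ebike_summary)[names(ebike_summary) == "is_ebike"] <- "ebike_pct"

# Gap between the most and least e-bike-heavy cells, in percentage points.
gap <- max(ebike_summary$ebike_pct) - min(ebike_summary$ebike_pct)

print(ebike_summary)
cat("Gap:", gap, "percentage points\n")
```

Applied to the figures reported above, the same subtraction gives 78.8 - 63.5 = 15.3 percentage points.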
Conclusion
Answer to Research Question:
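Station infrastructure has an inverse relationship with e-bike usage: high-infrastructure stations show a lower e-bike share (~67%) than low- and medium-infrastructure stations (~72% each), not a higher one. The difference appears only at the high level; low and medium are essentially the same.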
| Infrastructure Level | E-bike Usage |
|----------------------|--------------|
| Low                  | ~72%         |
| Medium               | ~72%         |
| High                 | ~67%         |
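The section does not show how the per-level percentages were aggregated, and the choice matters: a plain mean across stations treats every station equally, while a trip-weighted mean reflects the share of all trips. The sketch below computes both on made-up station rows so the two can be compared. The data frame toy_station_full is hypothetical and only mirrors the columns of station_full.

```r
# Toy station-level rows (illustrative values, NOT the report's data).
toy_station_full <- data.frame(
  infrastructure_level = c("Low", "Low", "Medium", "Medium", "High", "High"),
  ebike_pct = c(70, 74, 71, 73, 66, 68),
  trips = c(100, 100, 200, 200, 300, 100)
)

# Summarize each infrastructure level two ways.
by_level <- do.call(rbind, lapply(
  split(toy_station_full, toy_station_full$infrastructure_level),
  function(g) {
    data.frame(
      infrastructure_level = g$infrastructure_level[1],
      stations = nrow(g),
      station_mean = mean(g$ebike_pct),                    # each station counts once
      trip_weighted = weighted.mean(g$ebike_pct, g$trips)  # busy stations count more
    )
  }
))

print(by_level)
```

When busy stations differ systematically from quiet ones, the two columns diverge, so it is worth stating which one a summary table reports.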
Explanation:
A likely explanation, which trip data alone cannot confirm: high-infrastructure areas attract more riders. E-bikes are preferred, so they get taken first, and riders who arrive later often find only classic bikes left.
Connection to Overarching Question:
For bike type choice, infrastructure affects availability indirectly through demand:
Infrastructure effect (indirect): High-infrastructure areas create high demand, which depletes e-bikes
External factors (direct): Casual riders prefer e-bikes more than members; evening riders use e-bikes more than morning riders
Infrastructure doesn’t directly determine bike type preference—but it shapes availability by concentrating demand.